Abstract: Upper bounds for rates of convergence of posterior distributions associated to Gaussian process priors are obtained by van der Vaart and van Zanten in [14] and expressed in terms of a concentration function involving the Reproducing Kernel Hilbert Space of the Gaussian prior. Here lower-bound counterparts are obtained. As a corollary, we obtain the precise rate of convergence of posteriors for Gaussian priors in various settings.

1. Introduction


In the Bayesian non-parametrics literature, several general results about posterior consistency (see e.g. [1]) and posterior rates of convergence (see for instance [5; 13]) are now available. Roughly, the rate of convergence of the posterior is generally thought of as an $\varepsilon_n$ as small as possible such that the posterior probability of the ball centered at the true $f_0$ and of radius $\varepsilon_n$ still tends to 1 in probability. In this context a natural question is: starting from a fixed prior, what is the actual rate of convergence of the posterior? The tools proposed in the cited articles often allow one to obtain an upper bound for this posterior rate.

Also, from the practical point of view, non-parametric priors are now commonly used in applications; for example, the book [12] presents applications of Gaussian priors in machine learning. In non-parametric situations many priors will not lead to optimal rates: in some cases the corresponding posterior will still converge at some reasonable rate towards the true parameter or function; in other cases the convergence might be extremely slow or consistency might even fail. Determining the precise rate of convergence of the posterior can then help in choosing a prior adapted to the practical situation or in adjusting the prior parameters.

Given a class of functions, upper bounds for the rate are clearly optimal if they coincide with the minimax rate of convergence over the class. In the case where the upper bound is slower than the optimal rate, one would like to establish a bound from below for the rate. If both upper and lower bounds match, say up to some constant or logarithmic factor, then the exact rate of convergence of the posterior is determined. In this paper, the issue of obtaining a lower bound for the posterior rate is considered in the case of Gaussian priors. Though the focus will mainly be on the class of Gaussian priors, the methodology we introduce can be used in some cases to obtain lower bounds for other priors as well. In particular, we shall derive a lower bound result for a non-Gaussian prior in a specific example.

The organization of the paper is as follows. In the next section, we state our main result on lower bounds in a general framework and give its proof.
This result is applied in Section 3 to obtain lower bounds in two nonparametric 
models: the Gaussian white noise model and the problem of density estimation, 
with respectively Gaussian series priors and Riemann-Liouville priors. For the 
latter prior, upper bounds are also obtained which extend previous results of 
[14]. Technical results are gathered in Section 4. Concluding remarks are given 
in Section 5. 

Let us introduce some notation. For any real numbers $a, b$, we denote by $a \wedge b$ their minimum and by $a \vee b$ their maximum. We define Hellinger's distance $h(f,g)$ between two probability densities $f$ and $g$ as the $L^2$-distance between the root densities $\sqrt{f}$ and $\sqrt{g}$. Let $K(f,g) = \int f\log(f/g)\,d\mu$ stand for the Kullback-Leibler divergence between the two non-negative densities $f$ and $g$ relative to a measure $\mu$. Furthermore, we define the additional discrepancy measure $V(f,g) = \int f\,|\log(f/g) - K(f,g)|^2\,d\mu$. Let $L^2[0,1]$ be the space of square-integrable functions on the interval $[0,1]$, equipped with the $L^2$-norm $\|f\|_2 = (\int_0^1 f^2\,d\mu)^{1/2}$. Let $C^0[0,1]$ denote the space of continuous functions on $[0,1]$ equipped with the supremum norm $\|\cdot\|_\infty$. Let $C^\beta[0,1]$ denote the Hölder space of order $\beta$ of all continuous functions $f$ that have $\underline{\beta}$ continuous derivatives, for $\underline{\beta}$ the largest integer strictly smaller than $\beta$, with the $\underline{\beta}$th derivative $f^{(\underline{\beta})}$ being Lipschitz-continuous of order $\beta - \underline{\beta}$. This means that there exists a positive constant $C$, which might depend on the function $f$, such that

$$|f^{(\underline{\beta})}(y) - f^{(\underline{\beta})}(x)| \le C|y - x|^{\beta - \underline{\beta}}, \quad \forall x, y \in [0,1].$$
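As a numerical illustration of these definitions, the sketch below approximates $h$, $K$ and $V$ for densities on $[0,1]$, taking $\mu$ to be Lebesgue measure and using a midpoint rule. The grid size, the helper names and the two example densities (`uniform`, `tilted`) are arbitrary choices made for the illustration; they are not part of the text.

```python
import math


def _midpoint_grid(m):
    """Midpoints of m equal cells of [0, 1]; each point carries weight 1/m."""
    return [(j + 0.5) / m for j in range(m)]


def hellinger(f, g, m=10_000):
    """h(f, g) = || sqrt(f) - sqrt(g) ||_2 on [0, 1] (no factor 1/2)."""
    xs = _midpoint_grid(m)
    return math.sqrt(sum((math.sqrt(f(x)) - math.sqrt(g(x))) ** 2 for x in xs) / m)


def kullback_leibler(f, g, m=10_000):
    """K(f, g) = int f log(f / g) dmu, with mu Lebesgue measure on [0, 1]."""
    xs = _midpoint_grid(m)
    return sum(f(x) * math.log(f(x) / g(x)) for x in xs) / m


def v_discrepancy(f, g, m=10_000):
    """V(f, g) = int f |log(f / g) - K(f, g)|^2 dmu."""
    k = kullback_leibler(f, g, m)
    xs = _midpoint_grid(m)
    return sum(f(x) * (math.log(f(x) / g(x)) - k) ** 2 for x in xs) / m


def uniform(x):
    return 1.0


def tilted(x):
    return 0.5 + x  # integrates to 1 on [0, 1]


print(hellinger(tilted, uniform), kullback_leibler(tilted, uniform), v_discrepancy(tilted, uniform))
```

For these two densities the exact value of $K$ is $\int_{1/2}^{3/2} u\log u\,du \approx 0.0428$, and the familiar inequality $h^2 \le K$ can be observed on the output.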

2. Lower bound result 

Let $(X^{(n)}, \mathcal{A}^{(n)}, P_f^{(n)};\ f \in \mathcal{F})$ be a sequence of statistical experiments with observations $X^{(n)}$, where the parameter set $\mathcal{F}$ is a subset of a Banach space $\mathbb{B}$ (for instance $L^2[0,1]$ or $C^0[0,1]$) and $n$ is an indexing parameter, usually the sample size. We put a prior distribution $\Pi$ on $f$. In this paper we consider the case where the prior is the law of a Gaussian process taking its values almost surely in $\mathbb{B}$ (see below). We are interested in properties of the posterior distribution $\Pi(\cdot|X^{(n)})$ under $P_{f_0}^{(n)}$, where $f_0$ is the "true" function. We denote by $E_0$ the expectation under the latter distribution. For any $\varepsilon > 0$, let us define a Kullback-Leibler neighborhood of $f_0$ as

$$B_{KL}(f_0,\varepsilon) = \big\{f \in \mathcal{F}:\ K(P_{f_0}^{(n)}, P_f^{(n)}) \le n\varepsilon^2,\ V(P_{f_0}^{(n)}, P_f^{(n)}) \le n\varepsilon^2\big\}.$$

In this work Gaussian processes $Z$ are supposed to be centered and tight measurable random maps in the Banach space $(\mathbb{B}, \|\cdot\|)$. We refer to [15] for an overview of basic properties of these objects. Let $\mathbb{H}$ be the Reproducing Kernel Hilbert Space (RKHS) of the covariance kernel of the process. We will generally assume that $f_0$ belongs to the support of the prior in $\mathbb{B}$, which for Gaussian process priors is nothing but the closure of $\mathbb{H}$ in $\mathbb{B}$, see for instance [15], Lemma 5.1.

First let us review the key upper-bound results obtained in [14], where the authors show that for Gaussian priors an upper bound for the concentration rate of the posterior can often be obtained in a simple way from the so-called concentration function of the Gaussian process. This quantity is defined as follows. For any $\varepsilon > 0$, let

$$\varphi_{f_0}(\varepsilon) = \inf_{h\in\mathbb{H}:\ \|h - f_0\| < \varepsilon} \|h\|_{\mathbb{H}}^2 - \log P(\|Z\| < \varepsilon). \qquad (1)$$

Assume that the norm $\|\cdot\|$ on $\mathbb{B}$ is comparable to a metric $d$ appropriate to the statistical problem (often, $d$ is a distance for which certain tests exist, which allows one to apply the theory presented in [5]; for instance, in i.i.d. settings, one might choose Hellinger's distance). Here "comparable" means that the ball $\{f \in \mathcal{F}, \|f - f_0\| < \varepsilon_n\}$ should be included in the ball for $d$ around $f_0$ of radius $c\varepsilon_n$ and also in the Kullback-Leibler neighborhood $B_{KL}(f_0, c\varepsilon_n)$ defined above, for some $c > 0$. The authors in [14] prove that if $\varepsilon_n \to 0$ satisfies

$$\varphi_{f_0}(\varepsilon_n) \le n\varepsilon_n^2, \qquad (2)$$

then the posterior contracts at the rate $\varepsilon_n$ for the distance $d$, in that for large enough $M > 0$, $E_0\Pi(f: d(f,f_0) < M\varepsilon_n \mid X^{(n)}) \to 1$ as $n \to \infty$.
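In practice, (2) is often solved by balancing $\varphi_{f_0}(\varepsilon)$ against $n\varepsilon^2$. The minimal sketch below assumes that $\varphi_{f_0}$ is available as a decreasing function of $\varepsilon$ (as guaranteed by Lemma 3 below) and finds the crossing point by bisection on a logarithmic scale. The helper `rate_from_concentration` is hypothetical, and so is the toy choice $\varphi(\varepsilon) = \varepsilon^{-1/\alpha}$, which mimics a pure small-ball term and has exact solution $n^{-\alpha/(2\alpha+1)}$.

```python
import math


def rate_from_concentration(phi, n, lo=1e-12, hi=1.0, iters=200):
    """Approximately the smallest eps in (lo, hi] with phi(eps) <= n * eps**2.

    Assumes phi is decreasing, so gap(eps) = phi(eps) - n * eps**2 is decreasing
    and changes sign exactly once on (lo, hi].
    """
    def gap(e):
        return phi(e) - n * e * e

    if gap(hi) > 0:
        raise ValueError("phi(hi) > n * hi**2: enlarge hi")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # bisection on a log scale
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return hi


alpha = 0.5


def phi(e):
    return e ** (-1.0 / alpha)  # hypothetical concentration function


for n in (10**2, 10**4, 10**6):
    print(n, rate_from_concentration(phi, n), n ** (-alpha / (2 * alpha + 1)))
```

A logarithmic bisection is a natural choice here because the relevant values of $\varepsilon_n$ span many orders of magnitude as $n$ grows.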

These results mean that for Gaussian priors an upper bound on the rate of the posterior is obtained as soon as the following two quantities are controlled:

$$\varphi^A_{f_0}(\varepsilon) = \inf_{h\in\mathbb{H}:\ \|h - f_0\| < \varepsilon} \|h\|_{\mathbb{H}}^2, \qquad \varphi_0(\varepsilon) = -\log P(\|Z\| < \varepsilon). \qquad (3)$$

The first term measures how well elements in the RKHS $\mathbb{H}$ of the Gaussian process can approximate the true function. Note in particular that if $f_0$ happens to be in $\mathbb{H}$, this term simply remains bounded. The second term, which does not depend on $f_0$, is minus the logarithm of the so-called small ball probability of the Gaussian process. Small ball probabilities have been studied in many papers in the probability literature, and precise equivalents of $\varphi_0(\varepsilon)$ as $\varepsilon \to 0$ are available for many classes of Gaussian processes, see for instance [11]. Yet at first sight it is not obvious why the concentration function $\varphi_{f_0}$ should appear in the study of posterior rates. Lemma 2 below answers this question, at least partially.

Let us conclude the overview of upper-bound results with an example. In a context of density estimation, if one chooses Brownian motion as prior on continuous functions, the rate $\varepsilon_n$ depends on the Hölder regularity $\beta$ of the true $f_0$ as follows. If $\beta \ge 1/2$, then $\varepsilon_n$ can be chosen equal to $n^{-1/4}$, whereas if $\beta < 1/2$ the rate $\varepsilon_n$ must be of order $n^{-\beta/2}$ to satisfy (2), see Section 4.1 in [14] or Theorem 3 below. Thus, up to constants, the rate is optimal in the minimax sense if $\beta = 1/2$. However, for all other values of $\beta$, the obtained rate is slower than the minimax rate, which is $n^{-\beta/(2\beta+1)}$. Thus it is natural to ask whether the rate of concentration for Brownian motion is really the one described above or if in fact the posterior contracts faster.
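The exponents in this example are easy to tabulate. The sketch below assumes the general form $n^{-(\alpha\wedge\beta)/(2\alpha+1)}$ of the upper-bound rate for a prior of regularity $\alpha$ (this form appears in Section 3), specialised to Brownian motion ($\alpha = 1/2$), and compares it with the minimax exponent $\beta/(2\beta+1)$. Exact fractions avoid rounding issues; the function names are hypothetical.

```python
from fractions import Fraction


def posterior_rate_exponent(alpha, beta):
    """Exponent a in eps_n = n^(-a) for the rate n^{-(alpha min beta)/(2 alpha + 1)}."""
    return min(alpha, beta) / (2 * alpha + 1)


def minimax_exponent(beta):
    """Exponent of the minimax rate n^{-beta/(2 beta + 1)} over a beta-smooth class."""
    return beta / (2 * beta + 1)


half = Fraction(1, 2)  # Brownian motion corresponds to alpha = 1/2
for beta in (Fraction(1, 4), half, Fraction(1), Fraction(2)):
    print(beta, posterior_rate_exponent(half, beta), minimax_exponent(beta))
```

The two exponents coincide only at $\beta = 1/2$. For $\beta < 1/2$ the prior gives $\beta/2$, and for $\beta > 1/2$ it stays at $1/4$, which matches the discussion above.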

Let us now define a notion of lower bound for a given prior $\Pi$ on $\mathcal{F}$. Let $d$ be a distance on the parameter space. We say that the rate $\zeta_n$ is a lower bound for the concentration rate of the posterior distribution $\Pi(\cdot|X^{(n)})$ in terms of $d$ if, as $n \to +\infty$,

$$E_0\Pi(f: d(f,f_0) < \zeta_n \mid X^{(n)}) \to 0. \qquad (4)$$

This essentially means that $\zeta_n$ is too fast for the posterior measure to capture mass in the ball of radius $\zeta_n$ around $f_0$. The aim is then to prove that such a result holds for $\zeta_n$ as large as possible. In the sequel, we will be able to prove in some examples that the posterior asymptotically puts all its mass inside a ring of the type $\{f \in \mathcal{F},\ m_n\varepsilon_n \le d(f,f_0) \le \varepsilon_n\}$, for $m_n$ either a small enough constant or slowly decreasing (e.g. of logarithmic order), see Section 3. Note also that a rate which is a lower bound in the sense of Definition (4) cannot at the same time be an upper bound for the same distance, so our definition is, in a way, a strict one. But it seems to us to be the most natural one, for symmetry reasons with respect to upper-bound definitions, and also in view of the aforementioned 'ring' behavior. It would also be interesting to be even more precise about the behavior of the posterior: for instance, if asymptotically the posterior sits on a ring for a distance $d$, to see how the mass is distributed inside this ring. However, the presently available techniques, including the ones of this paper, give only results up to constants, so this would probably require introducing new techniques or refining the mentioned ones.

Theorem 1 below establishes a lower bound for the concentration rate of the posterior $\Pi(\cdot|X^{(n)})$ for Gaussian priors in terms of the norm $\|\cdot\|$ of the Banach space. Its proof relies on two basic ideas. The first one is that, roughly, if the prior puts very little mass (in some sense) on a certain measurable set, then the posterior probability of this set is also small. The following lemma is Lemma 1 in [6] (see also Lemma 5 in [1]).

Lemma 1. If $\alpha_n \to 0$ and $n\alpha_n^2 \to +\infty$ and if $B_n$ is a measurable set such that

$$\Pi(B_n)/\Pi(B_{KL}(f_0,\alpha_n)) \le e^{-2n\alpha_n^2},$$

then $E_0\Pi(B_n \mid X^{(n)}) \to 0$ as $n \to +\infty$.

The second ingredient is a general result about Gaussian priors which gives control from above and below of non-centered small ball probabilities associated to the process in terms of $\varphi_{f_0}$. For a proof, see for instance [9] or [15], Lemma 5.3.

Lemma 2. Let $Z$ be a Gaussian process in $\mathbb{B}$ with associated RKHS $\mathbb{H}$. Assume that $f_0$ belongs to the support of $Z$ in $\mathbb{B}$. Then for any $\varepsilon > 0$,

$$\varphi_{f_0}(\varepsilon) \le -\log P(\|Z - f_0\| < \varepsilon) \le \varphi_{f_0}(\varepsilon/2).$$

In view of this result, it seems natural for the concentration function $\varphi_{f_0}$ to appear when studying rates of contraction for Gaussian processes, since this function directly controls how much mass the prior puts on neighborhoods of the true function. The following lemma states some useful properties of $\varphi_{f_0}$. In particular, it implies that this function has an inverse $\varphi_{f_0}^{-1}$.

Lemma 3. Let $Z$ be a non-degenerate centered Gaussian process in $(\mathbb{B}, \|\cdot\|)$. For any $f_0$ in $\mathbb{B}$, the associated concentration function $\varepsilon \mapsto \varphi_{f_0}(\varepsilon)$ is strictly decreasing and convex on $(0,+\infty)$. In particular, it is continuous on $(0,+\infty)$.

This lemma is proved in Section 4. We can now state our first Theorem. 

Theorem 1. Let $Z$ be a Gaussian process with associated distribution $\Pi$ on the space $(\mathbb{B}, \|\cdot\|)$. Let the data $X^{(n)}$ be generated according to $P_{f_0}^{(n)}$ and assume that $f_0$ belongs to the support of $\Pi$ in $\mathbb{B}$. Let $\alpha_n \to 0$ be such that $n\alpha_n^2 \to +\infty$ and $\Pi(B_{KL}(f_0,\alpha_n)) \ge \exp(-cn\alpha_n^2)$ for some $c > 0$. Suppose that $\zeta_n \to 0$ is such that $\varphi_{f_0}(\zeta_n) \ge (2+c)n\alpha_n^2$. Then, as $n \to +\infty$,

$$E_0\Pi(f: \|f - f_0\| < \zeta_n \mid X^{(n)}) \to 0.$$

Proof. Due to Lemma 2, it holds $\Pi(\|f - f_0\| < \zeta_n) \le \exp(-\varphi_{f_0}(\zeta_n))$. Combining this with the assumption on the Kullback-Leibler type neighborhood, one gets that

$$\frac{\Pi(\|f - f_0\| < \zeta_n)}{\Pi(B_{KL}(f_0,\alpha_n))} \le \exp\big(-\varphi_{f_0}(\zeta_n) + cn\alpha_n^2\big).$$

The assumption on $\varphi_{f_0}(\zeta_n)$ ensures that the last display is further bounded from above by $\exp(-2n\alpha_n^2)$. An application of Lemma 1 with the choice of set $B_n = \{f \in \mathcal{F}, \|f - f_0\| < \zeta_n\}$ leads to $E_0\Pi(B_n \mid X^{(n)}) \to 0$. □

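Before commenting on this result, let us state a direct consequence of it. If the rate $\varepsilon_n$ satisfies (2) and if the norm $\|\cdot\|$ combines correctly with the Kullback-Leibler divergence, so that for some $d > 0$ it holds

$$\Pi(B_{KL}(f_0, d\varepsilon_n)) \ge \Pi(\|f - f_0\| < 2\varepsilon_n),$$

see Section 3 or [14] for some examples, then due to Lemma 2 we obtain that $\Pi(B_{KL}(f_0, d\varepsilon_n)) \ge \exp(-n\varepsilon_n^2)$. Hence, according to Theorem 1 applied with $\alpha_n = d\varepsilon_n$ and $c = 1/d^2$,

$$\zeta_n(f_0) = \varphi_{f_0}^{-1}\big((2d^2+1)\,n\varepsilon_n^2\big)$$

is a lower bound for the rate of convergence.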
Furthermore, if $\varphi_{f_0}$ is "nicely varying" (see below; this depends of course on the particular function $f_0$), then one expects to be able to choose $\zeta_n$ of about the same order as $\varepsilon_n$ (e.g. $\zeta_n = \varepsilon_n/\log n$ or even $\zeta_n = \varepsilon_n/K$ for $K$ large enough). For instance, if $\varphi_{f_0}^{-1}$ is regularly varying at $+\infty$, then $\zeta_n(f_0)$ is at least $\varepsilon_n/K$, for some $K$ large enough.
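The recipe $\zeta_n = \varphi_{f_0}^{-1}((2d^2+1)n\varepsilon_n^2)$ can be evaluated numerically once $\varphi_{f_0}$ is known. The sketch below inverts a strictly decreasing function by bisection (Lemma 3 ensures that the inverse exists). It uses the hypothetical regularly varying choice $\varphi(\varepsilon) = \varepsilon^{-1/\alpha}$ with $\varepsilon_n$ solving $\varphi(\varepsilon) = n\varepsilon^2$, and illustrative values $\alpha = 1/2$, $d = 2$. In that case $\varepsilon_n/\zeta_n$ is the constant $(2d^2+1)^\alpha = 3$, in line with the remark above.

```python
import math


def inverse_decreasing(phi, u, lo=1e-12, hi=1.0, iters=200):
    """Return eps with phi(eps) = u, for a continuous strictly decreasing phi."""
    if not (phi(hi) <= u <= phi(lo)):
        raise ValueError("u outside [phi(hi), phi(lo)]")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # bisection on a log scale
        if phi(mid) > u:
            lo = mid
        else:
            hi = mid
    return hi


def lower_bound_rate(phi, eps_n, n, d):
    """zeta_n = phi^{-1}((2 d^2 + 1) n eps_n^2), as in the remark after Theorem 1."""
    return inverse_decreasing(phi, (2 * d * d + 1) * n * eps_n ** 2)


alpha, d = 0.5, 2.0


def phi(e):
    return e ** (-1.0 / alpha)  # hypothetical regularly varying concentration function


for n in (10**3, 10**5, 10**7):
    eps_n = n ** (-alpha / (2 * alpha + 1))  # solves phi(eps) = n * eps**2
    print(n, eps_n / lower_bound_rate(phi, eps_n, n, d))  # approximately 3 for every n
```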

Thus we complement the result of [14], where the upper bound part was obtained, by proving a lower bound counterpart. Note also that, interestingly, only the lower bound of Lemma 2 is used to prove Theorem 1. By contrast, the main ingredients of the proof of the upper bound in [14] are Borell's inequality and the upper bound of Lemma 2. Note also that the assumptions of Theorem 1 are mainly in terms of the prior, the model coming in only through the Kullback-Leibler neighborhood.

As stated, Theorem 1 applies to Gaussian priors only. However, it illustrates well how the simple Lemma 1 can successfully be applied to obtain lower bounds for the concentration rate. In fact, when dealing with general priors, one can try to apply Lemma 1 directly. This idea enables us to obtain a lower bound result for a non-Gaussian prior (though constructed from Gaussian priors) further in this paper, see Theorem 3. This approach seems to be useful to get lower bounds for general priors in other contexts as well. Further contributions on this question are in preparation and should be available soon.

Another interesting question is how to get more explicit estimates of the rates $\varepsilon_n$ and $\zeta_n$ in terms of the class of functions the true $f_0$ belongs to and of the "regularity" $\alpha$ of the process in some sense (for Brownian motion and Hölder classes we would have $\alpha = 1/2$). In the next section, we address this question in some simple cases.

3. Applications 

3.1. The $L^2$-setting and Gaussian series priors

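Let $\{\epsilon_k\}_{k\ge1}$ be an orthonormal system in $L^2[0,1]$, chosen for simplicity equal to the trigonometric basis: $\epsilon_1 = 1$ and, for $k \ge 1$, $\epsilon_{2k}(\cdot) = \sqrt{2}\cos(2\pi k\,\cdot)$ and $\epsilon_{2k+1}(\cdot) = \sqrt{2}\sin(2\pi k\,\cdot)$. The Sobolev ball $\mathcal{F}_{\beta,L}$ of order $\beta > 0$ and radius $L > 0$ is

$$\mathcal{F}_{\beta,L} = \Big\{f \in L^2[0,1],\ f = \sum_{k\ge1} f_k\epsilon_k \ \text{and}\ \sum_{k\ge1} k^{2\beta}f_k^2 \le L^2\Big\}.$$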

Gaussian series priors. Let $\{a_k\}_{k\ge1}$ be a sequence of independent standard normal random variables and let $\{\sigma_k\}_{k\ge1}$ be a square-summable sequence of real numbers. For simplicity let us choose $\sigma_k = k^{-1/2-\alpha}$ for some $\alpha > 0$. Let us define $\Pi$ as the probability distribution generated by

$$X_\alpha(\cdot) = \sum_{k=1}^{+\infty} \sigma_k a_k \epsilon_k(\cdot). \qquad (5)$$

This defines a process with sample paths in $\mathbb{B} = L^2[0,1]$. The RKHS $\mathbb{H}^\alpha$ of $X_\alpha$ in $\mathbb{B}$ is $\mathbb{H}^\alpha = \{\sum_{k\ge1} h_k\sigma_k\epsilon_k,\ (h_k)_{k\ge1} \in \ell^2\}$, equipped with the norm $\|\sum_{k\ge1} h_k\sigma_k\epsilon_k\|_{\mathbb{H}^\alpha}^2 = \sum_{k\ge1} h_k^2$, see for instance [15], Theorem 4.2. Since the support of the process in $L^2$ is the closure of $\mathbb{H}^\alpha$ in $L^2$, it is easy to check that the support is in fact $L^2$ itself. Furthermore, the small ball probabilities of this process have a well-known behavior: $\varphi_0(\varepsilon) = -\log P(\|X_\alpha\|_2 < \varepsilon)$ is of the order of $\varepsilon^{-1/\alpha}$ as $\varepsilon \to 0$, see for instance [8], Theorem 4.
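To make (5) concrete, the following sketch draws a truncated sample path of $X_\alpha$ on a uniform grid. It also evaluates $\|h\|_{\mathbb{H}^\alpha}^2 = \sum_k k^{1+2\alpha}h_k^2$ for $h = \sum_k h_k\epsilon_k$, which follows from the description of $\mathbb{H}^\alpha$ above with $\sigma_k = k^{-1/2-\alpha}$. The truncation level, grid, seed and helper names are illustrative assumptions. The printed pair checks the discrete Parseval identity, which holds exactly here because every frequency used stays below half the number of grid points.

```python
import math
import random


def basis(k, t):
    """Trigonometric orthonormal basis: e_1 = 1, e_2k = sqrt2 cos(2 pi k t), e_2k+1 = sqrt2 sin(2 pi k t)."""
    if k == 1:
        return 1.0
    freq = k // 2
    trig = math.cos if k % 2 == 0 else math.sin
    return math.sqrt(2.0) * trig(2.0 * math.pi * freq * t)


def sample_series_prior(alpha, n_terms, grid, rng):
    """Truncated draw of X_alpha = sum_k sigma_k a_k e_k with sigma_k = k^(-1/2 - alpha)."""
    coeffs = [k ** (-0.5 - alpha) * rng.gauss(0.0, 1.0) for k in range(1, n_terms + 1)]
    path = [sum(c * basis(k, t) for k, c in enumerate(coeffs, start=1)) for t in grid]
    return coeffs, path


def rkhs_norm_sq(fourier_coeffs, alpha):
    """||h||_H^2 = sum_k k^(1 + 2 alpha) h_k^2 for h = sum_k h_k e_k."""
    return sum(k ** (1 + 2 * alpha) * h * h for k, h in enumerate(fourier_coeffs, start=1))


rng = random.Random(0)
m = 256
grid = [j / m for j in range(m)]
coeffs, path = sample_series_prior(alpha=1.0, n_terms=101, grid=grid, rng=rng)
# Discrete Parseval: the grid mean of X^2 equals sum_k sigma_k^2 a_k^2,
# since all frequencies used (at most 50) are below m / 2.
print(sum(x * x for x in path) / m, sum(c * c for c in coeffs))
```

Note that a draw from the prior itself has infinite RKHS norm in the limit. Applied to the truncated `coeffs`, `rkhs_norm_sq` therefore grows with `n_terms`.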

Gaussian white noise model. To simplify the formulation of the upper-bound results, we will assume that we are in a particularly simple model, namely the Gaussian white noise model. In this model the data $X^{(n)}$ are given by

$$dX^{(n)}(t) = f(t)\,dt + \frac{1}{\sqrt{n}}\,dW(t), \quad t \in [0,1], \qquad (6)$$

for some $f$ in $L^2[0,1]$ and $W$ standard Brownian motion. Let us denote, for any positive real numbers $\alpha$ and $\beta$,



$$r_n^{\alpha,\beta} = n^{-\frac{\alpha\wedge\beta}{2\alpha+1}}. \qquad (7)$$

In the sequel the notation $\lesssim$ is used for "smaller than or equal to a universal constant times" and $\gtrsim$ is defined similarly.

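Theorem 2. Let $\beta > 0$, $L > 0$ and suppose the data are generated according to (6). Let the prior process be defined by (5) with $\alpha > 0$. Let $f_0$ be in $\mathcal{F}_{\beta,L}$ and let the rate $r_n^{\alpha,\beta}$ be defined by (7). Let $\varepsilon_n$ and $\zeta_n$ be such that

$$\varphi_{f_0}(\varepsilon_n) \le n\varepsilon_n^2 \quad\text{and}\quad \zeta_n \le \varphi_{f_0}^{-1}(9n\varepsilon_n^2).$$

Then for $M$ large enough,

$$E_0\Pi(\zeta_n \le \|f - f_0\|_2 \le M\varepsilon_n \mid X^{(n)}) \to 1,$$

as $n \to +\infty$. For any $f_0$ in $\mathcal{F}_{\beta,L}$, one can choose $\varepsilon_n$ such that $\varepsilon_n \lesssim r_n^{\alpha,\beta}$ and, if $\alpha \le \beta$, one can choose $\zeta_n$ such that $\zeta_n \gtrsim r_n^{\alpha,\beta}$. Furthermore, if $\beta < \alpha$, there exists $f_0$ in $\mathcal{F}_{\beta,L}$ such that, for $p > 1/2 + \beta$ and $M$ large enough, as $n \to +\infty$,

$$E_0\Pi(r_n^{\alpha,\beta}\log^{-p} n \le \|f - f_0\|_2 \le Mr_n^{\alpha,\beta} \mid X^{(n)}) \to 1.$$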

The first convergence result is essentially a consequence of Theorem 3.4 in [14] for the upper bound and of Theorem 1 for the lower bound. The second part of the statement reveals that there are indeed functions in the class such that the posterior rate is $r_n^{\alpha,\beta}$, up to a log-factor, if $\beta < \alpha$. In this sense the rate can be said to be optimal (up to a log-factor) over $\mathcal{F}_{\beta,L}$.
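Theorem 2 can be illustrated numerically in the regime $\alpha \le \beta$. Projecting (6) on the basis $\{\epsilon_k\}$ gives the equivalent sequence model $Y_k = f_{0,k} + n^{-1/2}\xi_k$ with independent standard normal $\xi_k$. Under the prior (5) the posterior is then Gaussian, coordinate by coordinate. The sketch below computes in closed form $E_0$ of the posterior mean of $\|f - f_0\|_2^2$, a rough proxy for the posterior spread around $f_0$, and estimates the exponent in $n$ of its square root. The choice $f_{0,k} = k^{-1-\beta}$ (which lies in a Sobolev ball of order $\beta$), the truncation level and the two values of $n$ are assumptions made for the illustration. With $\alpha = 1 \le \beta = 2$ the slope should be close to $-\alpha/(2\alpha+1) = -1/3$.

```python
import math


def expected_posterior_sq_dist(n, alpha, f0, n_terms):
    """E_0 of the posterior mean of ||f - f0||_2^2 in the sequence model.

    Model: Y_k = f0_k + xi_k / sqrt(n); prior: f_k ~ N(0, sigma_k^2) independently,
    sigma_k^2 = k^(-1 - 2 alpha). Coordinatewise conjugate posterior:
      f_k | Y ~ N(w_k Y_k, sigma_k^2 / (1 + n sigma_k^2)),  w_k = n sigma_k^2 / (1 + n sigma_k^2).
    """
    total = 0.0
    for k in range(1, n_terms + 1):
        s2 = k ** (-1.0 - 2.0 * alpha)
        w = n * s2 / (1.0 + n * s2)
        bias_sq = (1.0 - w) ** 2 * f0(k) ** 2  # squared bias of the posterior mean
        noise = w * w / n                       # variance of the posterior mean under P_0
        spread = s2 / (1.0 + n * s2)            # posterior variance
        total += bias_sq + noise + spread
    return total


def empirical_exponent(alpha, beta, n1=10**4, n2=10**8, n_terms=100_000):
    """Slope of log sqrt(E_0 posterior squared distance) against log n, between n1 and n2."""
    def f0(k):
        return k ** (-1.0 - beta)  # hypothetical truth, in a Sobolev ball of order beta

    r1 = math.sqrt(expected_posterior_sq_dist(n1, alpha, f0, n_terms))
    r2 = math.sqrt(expected_posterior_sq_dist(n2, alpha, f0, n_terms))
    return (math.log(r2) - math.log(r1)) / (math.log(n2) - math.log(n1))


print(empirical_exponent(alpha=1.0, beta=2.0))  # close to -1/3
```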

It is interesting to compare these results to the ones obtained by [17] and [2], where the authors also study estimation in model (6) from the Bayesian perspective. Both works obtain the upper-bound result on $\varepsilon_n$ for priors defined by (5) by different methods, but they do not consider the question of optimality of the rate $r_n^{\alpha,\beta}$ when $\alpha \ne \beta$. In [2], the focus is on the question of adaptation when one also puts a prior on $\alpha$, and the authors obtain the minimax rate for the resulting prior for unknown $\beta$ under some conditions. In [17], an interesting observation about non-optimality is made, but in a rather different direction than ours: the author notes that, though the prior (5) leads to the minimax rate for $\alpha = \beta$, both the prior and the posterior put mass zero on the Sobolev space $\{f = (f_k)_{k\ge1},\ \sum_{k\ge1} k^{2\beta}f_k^2 < +\infty\}$ the true function $f_0$ belongs to.

Remark 1. If $\alpha \le \beta$, the precise rate of convergence of the posterior is, up to constants, equal to $r_n^{\alpha,\beta} = n^{-\alpha/(2\alpha+1)}$. If $\alpha > \beta$, more information on $f_0$ (for instance about the rate of decrease of its Fourier coefficients) is needed to evaluate the RKHS-approximation term and eventually obtain an explicit expression of the rate, see for example the special "worst-case" function $f_0$ considered in the proof of the theorem.

Remark 2. It is natural to ask whether it is possible to avoid the log-factor for the lower bound. The answer is yes if one allows sequences of functions: it can be checked that there exists a sequence $f_{0,n}$ in $\mathcal{F}_{\beta,L}$, where the function $f_{0,n}$ has only one properly chosen non-zero Fourier coefficient, such that, for $M$ large enough, $E_{f_{0,n}}\Pi(r_n^{\alpha,\beta}/M \le \|f - f_{0,n}\|_2 \mid X^{(n)})$ tends to 1 as $n \to +\infty$.

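Proof of Theorem 2. The fact that the posterior concentrates in a ball of radius $M\varepsilon_n$ for the $\|\cdot\|_2$-norm is the conclusion of Theorem 3.4 in [14]. The explicit upper bound for $\varepsilon_n$ is obtained as follows. Denoting $f_K = \sum_{k=1}^K f_{0,k}\epsilon_k(\cdot)$, note that $f_K$ belongs to $\mathbb{H}^\alpha$. Since $f_0$ belongs to $\mathcal{F}_{\beta,L}$, it holds

$$\|f_K - f_0\|_2^2 = \sum_{p>K} f_{0,p}^2 \le K^{-2\beta}\sum_{p>K} p^{2\beta}f_{0,p}^2 \le L^2K^{-2\beta},$$

$$\|f_K\|_{\mathbb{H}^\alpha}^2 = \sum_{p=1}^K p^{1+2\alpha}f_{0,p}^2 \le K^{(1+2\alpha-2\beta)\vee 0}\sum_{p=1}^K p^{2\beta}f_{0,p}^2 \le L^2K^{(1+2\alpha-2\beta)\vee 0}.$$

Let us now choose $K$ of the order of $\varepsilon_n^{-1/\beta}$, large enough so that $L^2K^{-2\beta} < \varepsilon_n^2$. The last display then implies that the approximation part $\varphi^A_{f_0}(\varepsilon_n)$ of the concentration function is at most a constant times $\varepsilon_n^{-\{(1+2\alpha-2\beta)/\beta\}\vee 0}$. On the other hand, the small ball term $\varphi_0(\varepsilon_n)$ is at most a constant times $\varepsilon_n^{-1/\alpha}$ for $n$ large enough, as noted at the beginning of this section. Hence

$$\varphi_{f_0}(\varepsilon_n) \lesssim \varepsilon_n^{-1/\alpha} + \varepsilon_n^{-\{(1+2\alpha-2\beta)/\beta\}\vee 0}.$$

If we choose $\varepsilon_n$ such that $n\varepsilon_n^2$ is equal to the latter quantity, we get $\varepsilon_n \lesssim n^{-(\alpha\wedge\beta)/(2\alpha+1)} = r_n^{\alpha,\beta}$.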
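The exponent bookkeeping in this last step can be checked mechanically. A term $\varepsilon^{-p}$ with $p \ge 0$ in the bound on $\varphi_{f_0}(\varepsilon_n)$ is at most $n\varepsilon_n^2$ once $\varepsilon_n \gtrsim n^{-1/(p+2)}$, and $\varepsilon_n$ must accommodate the slower of the two terms. The sketch below works with exact fractions and recovers $(\alpha\wedge\beta)/(2\alpha+1)$ for a few illustrative pairs $(\alpha,\beta)$. The helper names are hypothetical.

```python
from fractions import Fraction


def solve_exponent(p):
    """If phi(eps) ~ eps^(-p) with p >= 0, then n eps^2 = eps^(-p) gives eps = n^(-1/(p + 2))."""
    return 1 / (p + 2)


def series_prior_rate_exponent(alpha, beta):
    """Exponent a such that the bound on phi_{f0}(eps_n) yields eps_n ~ n^(-a)."""
    p_small_ball = 1 / alpha                                          # term eps^(-1/alpha)
    p_approx = max((1 + 2 * alpha - 2 * beta) / beta, Fraction(0))    # term eps^(-{...} v 0)
    # Both terms must be at most n eps_n^2: keep the slower rate (smaller exponent).
    return min(solve_exponent(p_small_ball), solve_exponent(p_approx))


pairs = [(Fraction(1), Fraction(2)), (Fraction(2), Fraction(1)), (Fraction(1, 2), Fraction(1, 4))]
for alpha, beta in pairs:
    print(alpha, beta, series_prior_rate_exponent(alpha, beta), min(alpha, beta) / (2 * alpha + 1))
```

The two printed exponents agree for every pair, because $\beta/(2\alpha+1) < \alpha/(2\alpha+1)$ exactly when $\beta < \alpha$.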

To obtain the lower bound result, we apply Theorem 1. Simple calculations reveal that for model (6), the set $B_{KL}(f_0,\varepsilon)$ coincides with $\{f, \|f - f_0\|_2 \le \varepsilon\}$, see Lemma 6 in [6], and thus $\Pi(B_{KL}(f_0, 2\varepsilon_n)) = \Pi(\|f - f_0\|_2 \le 2\varepsilon_n)$. Now apply the remark after Theorem 1 with $d = 2$ to obtain that if $\varphi_{f_0}(\varepsilon_n) \le n\varepsilon_n^2$, then any $\zeta_n$ such that $\varphi_{f_0}(\zeta_n) \ge 9n\varepsilon_n^2$ is a lower bound for the rate. To obtain a more explicit form for $\zeta_n$, we distinguish the cases $\alpha \le \beta$ and $\beta < \alpha$.

In the case $\alpha \le \beta$, let us use the fact that

$$\varphi_{f_0}(\zeta_n) \ge -\log\Pi(\|f\|_2 < \zeta_n) \gtrsim \zeta_n^{-1/\alpha},$$

where the last inequality is obtained using the asymptotics of the small ball probability of $X_\alpha$. Thus the condition $\varphi_{f_0}(\zeta_n) \ge 9n\varepsilon_n^2$ is satisfied if $\zeta_n$ is equal to a small enough constant times $n^{-\alpha/(2\alpha+1)} = r_n^{\alpha,\beta}$, since $\varepsilon_n$ can be chosen equal to a constant times $r_n^{\alpha,\beta} = r_n^{\alpha,\alpha}$.

In the case $\alpha > \beta$, let us define $f_0$ by specifying its Fourier coefficients as

$$f_{0,k} = \frac{1}{k^{1/2+\beta}\,(1+\log k)^{1/2}\,\log\log k}, \quad (k > 1).$$

Note that the series $\sum_k k^{2\beta}f_{0,k}^2$ converges, so without loss of generality one can assume that $f_0$ belongs to $\mathcal{F}_{\beta,L}$ (otherwise consider $af_0$ for $a > 0$ small enough). Moreover, one only needs to prove the lower bound result, the upper bound following from what precedes. In the remainder of the proof the rate $\varepsilon_n$ is thus taken equal to $Cr_n^{\alpha,\beta}$ for some constant $C > 0$.

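Let us denote $\zeta_n = \delta_n\varepsilon_n$, where $\delta_n \to 0$ is to be chosen, and let us bound $\varphi_{f_0}(\zeta_n)$ from below. We have $\varphi_{f_0}(\zeta_n) \ge \varphi^A_{f_0}(\zeta_n)$. Let $h = \sum_{k\ge1} h_k\epsilon_k$ be in the RKHS $\mathbb{H}^\alpha$ of the prior with $\|h - f_0\|_2 < \zeta_n$. Then, for any $k(n) \ge 1$, using the inequality $(x+y)^2 \ge x^2/2 - y^2$, valid for all reals $x$ and $y$,

$$\|h\|_{\mathbb{H}^\alpha}^2 = \sum_{k\ge1} k^{1+2\alpha}h_k^2 \ge \sum_{k=1}^{k(n)} k^{1+2\alpha}h_k^2 \ge \frac12\sum_{k=1}^{k(n)} k^{1+2\alpha}f_{0,k}^2 - \sum_{k=1}^{k(n)} k^{1+2\alpha}(h_k - f_{0,k})^2.$$

That is, with the notation $S(K) = \sum_{k=1}^K k^{1+2\alpha}f_{0,k}^2$ and using that $\|h - f_0\|_2 < \zeta_n$,

$$\|h\|_{\mathbb{H}^\alpha}^2 \ge \frac12 S(k(n)) - k(n)^{1+2\alpha}\zeta_n^2. \qquad (8)$$

Let us choose $k(n) = n^{1/(2\alpha+1)}\log n$ and $\delta_n = \log^{-p} n$ for some $p > 0$. Using the explicit form of the $f_{0,k}$'s, one obtains, denoting $l_n = \log\log n$, that $S(k(n)) \gtrsim k(n)^{1+2\alpha-2\beta}\,l_n^{-2}\log^{-1} n$. Thus

$$S(k(n)) \gtrsim n\varepsilon_n^2\,l_n^{-2}\log^{2\alpha-2\beta} n, \qquad k(n)^{1+2\alpha}\zeta_n^2 = n\varepsilon_n^2\log^{2\alpha+1-2p} n.$$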

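Since $\alpha > \beta$, the first of these two terms is of larger order than $n\varepsilon_n^2$. As soon as $2p > 1 + 2\beta$, it is also of larger order than the last term in the preceding display. Taking the infimum over $h$ in (8), we conclude that in this case

$$\varphi_{f_0}(\zeta_n) \ge \varphi^A_{f_0}(\zeta_n) \gtrsim n\varepsilon_n^2\,l_n^{-2}\log^{2\alpha-2\beta} n.$$

Thus $\varphi_{f_0}(\zeta_n)$ divided by $n\varepsilon_n^2$ tends to infinity. In view of the remark after Theorem 1, we obtain that $\zeta_n = \delta_n\varepsilon_n$ is a lower bound for the rate, which concludes the proof. □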
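The growth of this lower bound can be observed numerically. The sketch below evaluates the right-hand side of (8), divided by $n\varepsilon_n^2$, for the worst-case $f_0$ above, using the choices of $k(n)$ and $\delta_n$ made in the proof. It takes the constant $C = 1$ in $\varepsilon_n = Cr_n^{\alpha,\beta}$ and the illustrative values $\alpha = 2$, $\beta = 1$, $p = 2$ (so that $2p > 1 + 2\beta$). The ratio stays positive and increases with $n$, though only at the slow logarithmic speed predicted above. Function names are hypothetical.

```python
import math


def f0_coeff(k, beta):
    """Fourier coefficients of the worst-case f0 from the proof (defined for k > 1)."""
    return 1.0 / (k ** (0.5 + beta) * math.sqrt(1.0 + math.log(k)) * math.log(math.log(k)))


def lower_bound_ratio(n, alpha, beta, p):
    """Right-hand side of (8) divided by n eps_n^2, with eps_n = n^(-beta/(2 alpha + 1)),
    k(n) = n^(1/(2 alpha + 1)) log n (rounded down) and zeta_n = eps_n log^(-p) n."""
    eps_n = n ** (-beta / (2 * alpha + 1))
    k_n = int(n ** (1 / (2 * alpha + 1)) * math.log(n))
    zeta_n = eps_n * math.log(n) ** (-p)
    s = sum(k ** (1 + 2 * alpha) * f0_coeff(k, beta) ** 2 for k in range(2, k_n + 1))
    return (0.5 * s - k_n ** (1 + 2 * alpha) * zeta_n ** 2) / (n * eps_n ** 2)


for e in (6, 9, 12):
    print(e, lower_bound_ratio(10.0 ** e, alpha=2.0, beta=1.0, p=2.0))
```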
3.2. The $C^0[0,1]$-setting and Riemann-Liouville type priors

In this subsection we obtain new upper and lower bounds for posterior rates in the following model of density estimation. The observations $X_1, \dots, X_n$ are a random sample from a positive density $f_0$ on the interval $[0,1]$. Let us denote $w_0 = \log f_0$, so that $f_0 = e^{w_0}$.

Now let us explain how we construct a prior on positive densities $f$, following the approach considered in [14]. To any continuous function $w$ on the interval $[0,1]$, we associate the density $p_w$ (that is, a nonnegative function which integrates to 1) defined by

$$p_w(t) = \frac{e^{w(t)}}{\int_0^1 e^{w(s)}\,ds}, \quad t \in [0,1].$$

Let $W$ be a Gaussian process defining a prior $\Pi_w$ on $C^0[0,1]$. Then the quantity $p_W$ defines a random (non-Gaussian) density. The corresponding prior on the set of densities is denoted by $\Pi_p$. As Gaussian prior $W$ we choose the process $X^\alpha$ defined below.

First, let us define the Riemann-Liouville process of parameter $\alpha > 0$ as

$$R_t^\alpha = \int_0^t (t-s)^{\alpha-1/2}\,dB(s), \quad t \in [0,1], \qquad (9)$$

where $B$ is standard Brownian motion. Then the process prior, which we call the Riemann-Liouville type process (RL-type process), is defined as

$$X_t^\alpha = R_t^\alpha + \sum_{k=0}^{\underline{\alpha}+1} Z_k t^k, \quad t \in [0,1],$$

where $Z_0, \dots, Z_{\underline{\alpha}+1}, R^\alpha$ are independent, each $Z_k$ is standard normal, $R^\alpha$ is the Riemann-Liouville process of parameter $\alpha$, and $\underline{\alpha}$ denotes the largest integer strictly smaller than $\alpha$. Note that if $\alpha = 1/2$ then $R^\alpha$ is simply standard Brownian motion and, if $\{\alpha\} = 1/2$, with $\{\alpha\} \in [0,1)$ the fractional part of $\alpha$, then $R^\alpha$ is, up to a multiplicative constant, an $(\alpha - 1/2)$-fold integrated Brownian motion. It can be checked that the support in $C^0[0,1]$ of $X^\alpha$ is the whole space $C^0[0,1]$ (it is in fact in order to get the whole space as support that one adds the polynomial part), see [14], Section 4 and Theorem 4.3.
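To visualize this prior, the following sketch simulates an approximate path of $X^\alpha$ on a grid and maps it to the random density $p_{X^\alpha}$. The discretization is an assumption of this sketch and not a scheme from the text: a left-point Riemann sum over Brownian increments replaces the stochastic integral in (9), and a trapezoidal rule computes the normalizing integral. The helper `discrete_variance` checks that the discretized $R_1^\alpha$ has variance close to the exact value $\int_0^1 (1-s)^{2\alpha-1}ds = 1/(2\alpha)$. All names, the grid size and the seed are hypothetical.

```python
import math
import random


def rl_type_path(alpha, m, rng):
    """Approximate draw of X^alpha on the grid t_i = i/m, i = 0..m.

    R_t = int_0^t (t-s)^(alpha-1/2) dB(s) is replaced by a left-point Riemann sum over
    Brownian increments; the polynomial part uses underline(alpha) + 2 standard normals,
    where underline(alpha) is the largest integer strictly smaller than alpha.
    """
    h = 1.0 / m
    dB = [rng.gauss(0.0, math.sqrt(h)) for _ in range(m)]
    ts = [i * h for i in range(m + 1)]
    r = [sum((t - j * h) ** (alpha - 0.5) * dB[j] for j in range(i)) for i, t in enumerate(ts)]
    under_alpha = math.ceil(alpha) - 1
    z = [rng.gauss(0.0, 1.0) for _ in range(under_alpha + 2)]
    return ts, [ri + sum(zk * t ** k for k, zk in enumerate(z)) for ri, t in zip(r, ts)]


def discrete_variance(alpha, m):
    """Variance of the Riemann-sum approximation of R_1; the exact value is 1/(2 alpha)."""
    h = 1.0 / m
    return sum((1.0 - j * h) ** (2 * alpha - 1) * h for j in range(m))


def density_from_path(values):
    """p_w = e^w / int_0^1 e^w on a uniform grid, with the trapezoidal rule for the integral."""
    m = len(values) - 1
    ew = [math.exp(v) for v in values]
    integral = (sum(ew) - 0.5 * (ew[0] + ew[-1])) / m
    return [e / integral for e in ew]


rng = random.Random(1)
ts, w = rl_type_path(alpha=1.5, m=200, rng=rng)
p = density_from_path(w)
print(discrete_variance(1.0, 1000))  # 0.5005, close to 1/(2 * 1)
```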

Let us denote by $\varphi_{w_0}$ the concentration function associated to the process $X^\alpha$ and the continuous function $w_0$. Upper bounds on $\varphi_{w_0}$, used in the proof of the next theorem to get explicit upper-bound rates, are obtained in Section 4.1.

Theorem 3. Suppose that $w_0 = \log f_0$ belongs to the Hölder class $C^\beta[0,1]$ for some $\beta > 0$ and let the prior on densities be the distribution $\Pi_p$ of $p_{X^\alpha}$, where $X^\alpha$ is a Riemann-Liouville type process of parameter $\alpha > 0$. Then there exist finite constants $C_1, C_2 > 0$ such that, if $\varepsilon_n$ and $\zeta_n$ are such that

$$\varphi_{w_0}(\varepsilon_n) \le n\varepsilon_n^2 \quad\text{and}\quad \zeta_n \le C_1\varphi_{w_0}^{-1}(C_2n\varepsilon_n^2),$$

then for $M$ large enough, as $n \to +\infty$,

$$E_0\Pi_p(h(f,f_0) \le M\varepsilon_n \mid X^{(n)}) \to 1,$$
$$E_0\Pi_p(\|f - f_0\|_\infty \ge \zeta_n \mid X^{(n)}) \to 1,$$

where $h$ is Hellinger's distance. Moreover, one can choose $\varepsilon_n$ such that $\varepsilon_n \lesssim r_n^{\alpha,\beta}$ if $\{\alpha\} = 1/2$ or $\alpha$ does not belong to $\beta + 1/2 + \mathbb{N}$, and $\varepsilon_n \lesssim n^{-\beta/(2\alpha+1)}\log n$ otherwise.

These results describe in a rather complete way the rate of convergence of the posterior for the prior $\Pi_p$ constructed from the Riemann-Liouville prior, for all values of the parameter $\alpha$ in $(0,+\infty)$. Also, from the upper-bound point of view, they improve on Theorem 4.3 in [14], where $\alpha = \beta$ is needed.

Note that, while upper-bound rates are obtained for Hellinger's distance, the lower bounds are in terms of the uniform norm. To obtain the lower bounds, the uniform norm is in a way the most natural (and easiest) distance to work with, since it is the norm of the Banach space where the prior lives; in the proof, the idea will indeed be to apply Lemma 1 with sets of the form $B_n = \{f, \|f - f_0\|_\infty < \zeta_n\}$. For upper bounds, Hellinger's distance is a rather natural choice since it is a natural testing distance for i.i.d. observations in view of the theory of [5]. A natural extension of our results would be to obtain results in terms of a common distance on the parameter space. While such a refinement is beyond the scope of the present contribution, we hope that future papers will answer this type of question.

In the above theorem, explicit bounds for $\varepsilon_n$ are obtained using explicit upper bounds for the concentration functions obtained in Section 4.1. It should also be possible to obtain explicit bounds for $\zeta_n$ in the spirit of those of Theorem 2 by bounding the concentration function from below. One difficulty here with respect to Theorem 2 is the presence of the extra polynomial part in the definition of the process, which makes the evaluations more difficult, even for the small ball term. We will not discuss this issue further here, but only note that in some simple cases an explicit expression for $\zeta_n$ follows quite directly from what precedes.

Remark 3. For Brownian motion released at zero, $X_t = B_t + Z_0$, by a slight adaptation of the preceding, one can obtain an explicit evaluation of $\zeta_n$ and show that if $f_0$ is smooth enough, more precisely if $\beta \ge 1/2$, there exists a constant $m$ such that, as $n \to +\infty$,

$$E_0\Pi_p(\|f - f_0\|_\infty \ge mn^{-1/4} \mid X^{(n)}) \to 1.$$

Note that $X_t$ is almost the RL-type process with $\alpha = 1/2$, except for the term $tZ_1$. It can still be checked that the support in $C^0[0,1]$ of this process is the full space $C^0[0,1]$, see [14], Theorem 4.1, and that Theorem 3 still holds, following the same proof. We always have $\varphi_{w_0}(\varepsilon) \ge \varphi_0(\varepsilon) = -\log P(\|X\|_\infty < \varepsilon)$. But on the event that $\|X\|_\infty < \varepsilon$, we have $|X_0| = |Z_0| < \varepsilon$, thus $\|B\|_\infty < 2\varepsilon$. Using the behavior of the small ball probability of Brownian motion, we obtain that there exists a constant $C > 0$ such that $\varphi_{w_0}(\varepsilon) \ge C\varepsilon^{-2}$ for $\varepsilon$ small enough, and thus $\varphi_{w_0}^{-1}(u) \gtrsim u^{-1/2}$. Hence $\zeta_n$ can be chosen equal to a constant times $(n\varepsilon_n^2)^{-1/2}$, where $\varepsilon_n$ is the upper-bound rate obtained for $X_t$. By the smoothness assumption on $f_0$, the rate $\varepsilon_n$ can be chosen equal to a constant times $r_n^{1/2,\beta} = n^{-1/4}$, which yields the announced result on $\zeta_n$.
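The $\varepsilon^{-2}$ behavior used in this remark can be checked from the classical series for the distribution of the supremum of Brownian motion on $[0,1]$:
$P(\sup_{t\le1}|B_t| < \varepsilon) = \frac{4}{\pi}\sum_{k\ge0}\frac{(-1)^k}{2k+1}\exp\big(-\frac{(2k+1)^2\pi^2}{8\varepsilon^2}\big)$.
This series implies $-\log P(\|B\|_\infty < \varepsilon) \sim \pi^2/(8\varepsilon^2)$ as $\varepsilon \to 0$. The sketch below evaluates the truncated series (the number of terms is an arbitrary choice) and prints $-\varepsilon^2\log P$, which should approach $\pi^2/8 \approx 1.234$. Values of $\varepsilon$ much below $0.05$ would underflow in double precision.

```python
import math


def prob_sup_bm_below(eps, terms=200):
    """P(sup_{0<=t<=1} |B_t| < eps) via the classical alternating series."""
    total = 0.0
    for k in range(terms):
        m = 2 * k + 1
        total += (-1) ** k / m * math.exp(-(m * math.pi) ** 2 / (8.0 * eps * eps))
    return 4.0 / math.pi * total


for eps in (1.0, 0.5, 0.2, 0.1, 0.05):
    p = prob_sup_bm_below(eps)
    print(eps, -eps * eps * math.log(p))  # tends to pi^2 / 8 as eps -> 0
```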

The following Lemmas are used in the proof of Theorem 3. The proof of 
Lemma 4 can be found in Section 4.2. 

Lemma 4. Let $\varphi_w$ denote the concentration function associated to the process $X^\alpha$ and the function $w \in C^0[0,1]$, and let $\rho$ denote both the real number $\rho$ and the constant function equal to $\rho$. Then for any $\varepsilon > 0$,

$$\varphi_{w_0+\rho}(\varepsilon) \ge \varphi_{w_0}(\varepsilon) + \rho^2 - 2(|w_0(0)| + \varepsilon)|\rho|.$$

Lemma 5 (Lemma 3.1 in [14]). For any $v, w$ elements of $C^0[0,1]$,

$$h(p_v,p_w) \le \|v - w\|_\infty\, e^{\|v-w\|_\infty/2},$$
$$K(p_v,p_w) \vee V(p_v,p_w) \lesssim \|v - w\|_\infty^2\, e^{\|v-w\|_\infty/2}(1 + \|v - w\|_\infty)^2.$$
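The Hellinger part of Lemma 5 is easy to test numerically. The sketch below builds $p_v$ and $p_w$ with a midpoint rule on $[0,1]$ and compares $h(p_v,p_w)$ with $\|v-w\|_\infty e^{\|v-w\|_\infty/2}$. The function $v$, the family of perturbations $w = v + s\cos(3\,\cdot)$ and the grid size are hypothetical choices. The supremum is approximated on the grid, which can only make the right-hand side slightly smaller.

```python
import math


def density_from_function(w, m=20_000):
    """Midpoint-rule values of p_w = e^w / int_0^1 e^w on m cells of [0, 1]."""
    xs = [(j + 0.5) / m for j in range(m)]
    ew = [math.exp(w(x)) for x in xs]
    z = sum(ew) / m
    return [e / z for e in ew]


def hellinger_between(v, w, m=20_000):
    """h(p_v, p_w) = || sqrt(p_v) - sqrt(p_w) ||_2, midpoint rule."""
    pv, pw = density_from_function(v, m), density_from_function(w, m)
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(pv, pw)) / m)


def sup_distance(v, w, m=20_000):
    """Grid approximation of || v - w ||_inf."""
    xs = [(j + 0.5) / m for j in range(m)]
    return max(abs(v(x) - w(x)) for x in xs)


def v(x):
    return math.sin(2 * math.pi * x)


for shift in (0.1, 0.5, 1.0, 2.0):
    def w(x, s=shift):
        return v(x) + s * math.cos(3 * x)  # hypothetical perturbation of v

    delta = sup_distance(v, w)
    print(shift, hellinger_between(v, w), delta * math.exp(delta / 2))
```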

Proof of Theorem 3. The fact that any $\varepsilon_n$ such that $\varphi_{w_0}(\varepsilon_n) \le n\varepsilon_n^2$ is an upper bound for the rate is Theorem 3.1 in [14]. Theorem 4 in the Appendix then gives the explicit expression of $\varepsilon_n$ in terms of $r_n^{\alpha,\beta}$.

To obtain the lower bound result, we show that if $\zeta_n \le C_1\varphi_{w_0}^{-1}(C_2n\varepsilon_n^2)$ for suitable constants $C_1, C_2 > 0$, then $\Pi_p(\|f - f_0\|_\infty < \zeta_n) \le \exp(-3n\varepsilon_n^2)$. This is enough to obtain the lower bound statement, since one can then apply Lemma 1 with $B_n = \{f, \|f - f_0\|_\infty < \zeta_n\}$. The prior probability of the Kullback-Leibler type neighborhood is bounded from below using Lemma 5 to obtain a neighborhood in terms of the $\|\cdot\|_\infty$-norm, and finally, due to the fact that the support of $X^\alpha$ in $C^0[0,1]$ is the whole space $C^0[0,1]$, Lemma 2 can be used.

Let $A_n$ be the set $\{w \in C^0[0,1], \|p_w - f_0\|_\infty < \zeta_n\}$. Since $\zeta_n \to 0$ and $f_0 \ge \rho_0 > 0$ for some $\rho_0 > 0$, it holds $2\|f_0\|_\infty \ge p_w \ge \rho_0/2 > 0$ on $A_n$ for $n$ large enough. Since the logarithm is a Lipschitz function on the interval $[\rho_0/2, 2\|f_0\|_\infty]$, one gets, on $A_n$, for some $d > 0$,

$$\|\log p_w - \log f_0\|_\infty \le d\,\|p_w - f_0\|_\infty \le d\zeta_n.$$

Noting that

$$\|\log p_w - \log f_0\|_\infty = \Big\|\log\frac{e^w}{\int_0^1 e^{w(s)}\,ds} - w_0\Big\|_\infty = \Big\|w - w_0 - \log\int_0^1 e^{w(s)}\,ds\Big\|_\infty,$$

one obtains that, on $A_n$, it holds $\|w - w_0 - \xi_w\|_\infty \le d\zeta_n$, where $\xi_w = \log\int_0^1 e^{w(s)}\,ds$ is a constant function and $|\xi_w| \le \|w\|_\infty$. We shall use the fact that, with high probability, this value is not too large. Note that if $w$ is in $A_n$ and $\|w\|_\infty \le C\sqrt{n}\varepsilon_n$, then $w$ belongs to $\cup_{k=-N}^{N} B_k$, where $B_k = \{w, \|w - w_0 - c_k\|_\infty < 2d\zeta_n\}$ with $c_k = kd\zeta_n$ and $N$ the smallest integer larger than $C\sqrt{n}\varepsilon_n/(d\zeta_n)$. Thus

$$\Pi_p(\|f - f_0\|_\infty < \zeta_n) = \Pi_w(\|p_w - f_0\|_\infty < \zeta_n) \le \sum_{k=-N}^{N} \Pi_w(\|w - w_0 - c_k\|_\infty < 2d\zeta_n) + \Pi_w(\|w\|_\infty > C\sqrt{n}\varepsilon_n).$$

It is an easy consequence of Borell's inequality, see [?] or [16], Proposition A.2.1, that $\Pi_w(\|w\|_\infty > C\sqrt{n}\varepsilon_n)$ is bounded above by $\exp(-4n\varepsilon_n^2)$ for $C$ large enough. Now due to Lemma 2,

$$\Pi_w(\|w - w_0 - c_k\|_\infty < 2d\zeta_n) \le \exp\big(-\varphi_{w_0+c_k}(2d\zeta_n)\big).$$

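Let $I_1$ be the set of indexes $k$ such that $|c_k| \le 4|w_0(0)|$ and $I_2$ the set of indexes such that $|c_k| > 4|w_0(0)|$. According to Lemma 4, we have, for $n$ large enough,

$$\varphi_{w_0+c_k}(2d\zeta_n) \ge \begin{cases} \varphi_{w_0}(2d\zeta_n) - 9|w_0(0)|^2, & \text{if } k \in I_1,\\ \varphi_{w_0}(2d\zeta_n) + c_k^2/4, & \text{if } k \in I_2. \end{cases}$$

Thus for some $C_4 > 0$, it holds

$$\Pi_p(\|f - f_0\|_\infty < \zeta_n) \le C_4\,\zeta_n^{-2}\exp\big(-\varphi_{w_0}(2d\zeta_n)\big) + \exp(-4n\varepsilon_n^2).$$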

Using the behavior of the small ball probability of the process at stake, we have that $\varphi_{w_0}(2d\zeta_n) \gtrsim \zeta_n^{-1/\alpha}$, hence for $n$ large enough it holds $\varphi_{w_0}(2d\zeta_n) + 2\log\zeta_n \ge \varphi_{w_0}(2d\zeta_n)/2$. Thus the last display is bounded from above by $2C_4\exp(-4n\varepsilon_n^2)$ as soon as $\varphi_{w_0}(2d\zeta_n) \ge 8n\varepsilon_n^2$, which concludes the proof. □

4. Appendix 

4.1. Concentration function of RL-type processes: upper bounds

In this subsection, we establish an upper-bound result on the concentration 
function of the RL-type process which is of independent interest and which is 
used in the proof of Theorem 3 to get explicit upper bound rates. 

First, let us recall the classical notion of fractional integral. For $\alpha>0$ and $f$ a continuous function on $[0,1]$, the fractional integral of order $\alpha$ is defined as

$$I^{\alpha}_{0+}f(t)=\frac{1}{\Gamma(\alpha)}\int_0^t(t-s)^{\alpha-1}f(s)\,ds,$$

for any $t$ in $[0,1]$. If $t<0$, we set $I^{\alpha}_{0+}f(t)=0$.

We shall use the following two lemmas, the second enabling us to handle a case excluded by the first one (namely the case where $\alpha+\lambda=1$). In the next statement, the symbol $*$ stands for the usual convolution between functions.

Lemma 6 (Lemma 5.2 in [14]). Let $\lambda\in[0,1]$ and $\alpha\in[0,1)$ be such that $\alpha+\lambda\in(0,2)$ and $\alpha+\lambda\ne1$. If $f\in C^\lambda[0,1]$ and $g\in L_1(\mathbb{R})$ has compact support and satisfies $\int g(u)\,du=0$ and, in the case that $\alpha+\lambda>1$, also $\int u\,g(u)\,du=0$, then

$$\|I^{\alpha}_{0+}(f*g)\|_\infty\lesssim\int|u|^{\alpha+\lambda}\,|g(u)|\,du.$$



Lemma 7. Let $\delta\in(0,1)$ and $f\in C^\delta[0,1]$. If $g\in L_1(\mathbb{R})$ has compact support and satisfies $\int g(u)\,du=0$, then

$$\|I^{1-\delta}_{0+}(f*g)\|_2^2\lesssim\int u^2\{1+\log^2(1+|u|^{-1})\}\,g(u)^2\,du.$$

The proof of Lemma 7 can be found in Section 4.2. 
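The fractional integral $I^\alpha_{0+}$ defined at the beginning of this subsection can be made concrete with a small numerical sketch. The Python function `frac_integral` below is a hypothetical helper and not part of the paper. It uses the substitution $r=(t-s)^\alpha$, which turns the weakly singular kernel into a constant and gives $I^\alpha_{0+}f(t)=\Gamma(\alpha+1)^{-1}\int_0^{t^\alpha}f(t-r^{1/\alpha})\,dr$. It then approximates that integral with a midpoint rule, so it is an illustration rather than a high-accuracy quadrature.

```python
import math

def frac_integral(f, alpha, t, n=20_000):
    """Approximate I^alpha_{0+} f(t) = (1/Gamma(alpha)) * int_0^t (t - s)**(alpha - 1) f(s) ds.

    After the substitution r = (t - s)**alpha,
        I^alpha_{0+} f(t) = (1/Gamma(alpha + 1)) * int_0^{t**alpha} f(t - r**(1/alpha)) dr,
    which is evaluated with the composite midpoint rule on n cells.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if t <= 0:
        return 0.0  # convention I^alpha_{0+} f(t) = 0 for t < 0; the integral vanishes at t = 0
    upper = t ** alpha
    h = upper / n
    total = sum(f(t - ((i + 0.5) * h) ** (1.0 / alpha)) for i in range(n))
    return total * h / math.gamma(alpha + 1.0)

# Compare with closed forms: I^alpha 1 = t^alpha / Gamma(alpha + 1), I^alpha s = t^(alpha+1) / Gamma(alpha + 2).
print(frac_integral(lambda s: 1.0, 0.5, 0.7), 0.7 ** 0.5 / math.gamma(1.5))
print(frac_integral(lambda s: s, 0.3, 0.9), 0.9 ** 1.3 / math.gamma(2.3))
```

For $\alpha=1$ the function reduces to the ordinary integral $\int_0^tf(s)\,ds$. For example, `frac_integral(math.cos, 1.0, t)` approximates $\sin t$.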

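Theorem 4. Suppose $f_0$ belongs to $C^\beta[0,1]$, with $\beta>0$. The concentration function $\varphi_{f_0}$ associated to the process $X^\alpha$ satisfies, if $0<\alpha\le\beta$, $\varphi_{f_0}(\varepsilon)=O(\varepsilon^{-1/\alpha})$ as $\varepsilon\to0$. In the case that $\alpha>\beta$, as $\varepsilon\to0$,

$$\varphi_{f_0}(\varepsilon)=\begin{cases}O\big(\varepsilon^{-(2\alpha-2\beta+1)/\beta}\big) & \text{if } \{\alpha\}=1/2 \text{ or } \alpha\notin\beta+1/2+\mathbb{N},\\ O\big(\varepsilon^{-(2\alpha-2\beta+1)/\beta}\log(1/\varepsilon)\big) & \text{otherwise.}\end{cases}$$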

This extends Theorem 4.3 in [14] in the case that $\alpha>\beta$. There is an extra difficulty in the case where $\alpha-\beta-1/2$ is an integer and $\{\alpha\}$ is not $1/2$, resulting in the presence of the extra log-factor. Roughly, the difficulty arises from the fact that, if $\alpha\in(0,1)$ and $\lambda\in[0,1]$, the fractional integral $I^\alpha_{0+}$ maps $C^\lambda[0,1]$ into $C^{\lambda+\alpha}[0,1]$ only if $\alpha+\lambda\ne1$, see [7]. Lemma 7 enables us to deal with the case $\alpha+\lambda=1$.
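The case distinction of Theorem 4 can be encoded in a few lines. The hypothetical Python helper below returns the exponent $a$ such that $\varphi_{f_0}(\varepsilon)=O(\varepsilon^{-a})$, together with a flag for the extra $\log(1/\varepsilon)$ factor. It uses exact rationals (`fractions.Fraction`) so that the test $\alpha\in\beta+1/2+\mathbb{N}$ is not affected by rounding. It only restates the theorem and does not check the hypotheses on $f_0$.

```python
from fractions import Fraction
import math

def concentration_order(alpha, beta):
    """Restate Theorem 4: f0 in C^beta[0, 1], prior X^alpha (alpha, beta > 0).

    Returns (a, with_log) such that phi_{f0}(eps) = O(eps**(-a)) as eps -> 0,
    multiplied by log(1/eps) when with_log is True.
    """
    alpha, beta = Fraction(alpha), Fraction(beta)
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    if alpha <= beta:
        return 1 / alpha, False                  # small ball term dominates
    a = (2 * alpha - 2 * beta + 1) / beta        # RKHS-approximation term dominates
    frac_alpha = alpha - math.floor(alpha)       # fractional part {alpha}
    shift = alpha - beta - Fraction(1, 2)
    on_lattice = shift.denominator == 1 and shift >= 0   # alpha in beta + 1/2 + N
    return a, on_lattice and frac_alpha != Fraction(1, 2)

print(concentration_order(Fraction(9, 4), Fraction(7, 4)))
print(concentration_order(2, 1))
```

For instance, $\alpha=9/4$ and $\beta=7/4$ fall in the exceptional case (exponent $8/7$ with a log factor). By contrast, $\alpha=2$ and $\beta=1$ give exponent $3$ without it.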

Proof of Theorem 4. Let us denote by $Z=X^\alpha-R^\alpha$ the polynomial part of $X^\alpha$, and by $\mathbb{H}^\alpha$ the RKHS of $R^\alpha$. The proof is quite similar to that of Theorem 4.3 in [14], and the starting point is identical. Using Theorem 2.3 in [14], the initial step of the proof is to bound the concentration function $\varphi_{f_0}(2\varepsilon)$ from above by a multiple of the sum $\varphi_{f_0-P}(\varepsilon/2,R^\alpha)+\varphi_P(\varepsilon/2,Z)$, with the polynomial $P$ to be chosen in the RKHS $\mathbb{H}^Z$ of $Z$. The spaces $\mathbb{H}^Z$ and $\mathbb{H}^\alpha$ are known explicitly. The space $\mathbb{H}^Z$ is the set of polynomials $P_c=\sum_ic_it^i$, of degree at most that of $Z$, equipped with the norm $\|P_c\|^2_{\mathbb{H}^Z}=\sum_ic_i^2$. The RKHS $\mathbb{H}^\alpha$ is the space $I^{\alpha+1/2}_{0+}(L_2[0,1])$ with associated norm $\|I^{\alpha+1/2}_{0+}f\|_{\mathbb{H}^\alpha}=\|f\|_2/\Gamma(\alpha+1/2)$, where $\Gamma$ is the Gamma function, due to Theorem 4.2 in [14].

Let us check that, for the process $X^\alpha$, the small ball term $\varphi^0(\varepsilon)$ is bounded above by a constant times $\varepsilon^{-1/\alpha}$ for $\varepsilon$ small enough. Indeed, it is known that for the Riemann-Liouville process $R^\alpha$, the quantity $-\log P(\|R^\alpha\|_\infty<\varepsilon)$ behaves as a constant times $\varepsilon^{-1/\alpha}$ as $\varepsilon\to0$, see [10]. Moreover, writing $Z=\sum_kZ_kt^k$, for any $k$ the quantity $P(\|Z_kt^k\|_\infty<\varepsilon)$ behaves as a constant times $\varepsilon$. By independence of $Z$ and $R^\alpha$ (the polynomial part only contributes a term of order $\log(1/\varepsilon)$), this implies that $-\log P(\|X^\alpha\|_\infty<\varepsilon)$ is smaller than a constant times $\varepsilon^{-1/\alpha}$ for $\varepsilon$ small enough.
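The following numerical sketch shows why the polynomial part is harmless. It assumes, as the form of the norm on $\mathbb{H}^Z$ suggests, that the coefficients $Z_k$ of $Z$ are independent standard normal variables. It bounds $-\log P(\|Z\|_\infty<\varepsilon)$ through $\|Z\|_\infty\le\sum_k|Z_k|$ on $[0,1]$. The helper names are hypothetical.

```python
import math

def small_ball_normal(eps):
    """P(|Z| < eps) for a standard normal Z, i.e. erf(eps / sqrt(2))."""
    return math.erf(eps / math.sqrt(2.0))

def poly_part_small_ball_bound(eps, m):
    """Upper bound on -log P(||Z||_inf < eps) for Z(t) = sum_{k=0}^m Z_k t^k on [0, 1].

    Assumes Z_0, ..., Z_m i.i.d. N(0, 1).  Since ||Z||_inf <= sum_k |Z_k| on [0, 1],
    the event {|Z_k| < eps / (m + 1) for all k} implies ||Z||_inf < eps, and by
    independence its probability is a product of m + 1 equal factors.
    """
    return -(m + 1) * math.log(small_ball_normal(eps / (m + 1)))

for eps in (1e-2, 1e-4, 1e-6):
    # first ratio approaches sqrt(2/pi); second ratio stays bounded (O(log(1/eps)) cost)
    print(eps, small_ball_normal(eps) / eps,
          poly_part_small_ball_bound(eps, m=2) / math.log(1 / eps))
```

The bound grows only like $\log(1/\varepsilon)$, so it is negligible with respect to $\varepsilon^{-1/\alpha}$ as $\varepsilon\to0$.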

Now we study the RKHS-approximation term $\varphi^A_{f_0}(\varepsilon)$. There are different cases depending on the value of the fractional part $\{\alpha\}$ of $\alpha$, namely $\{\alpha\}\in(0,1/2)$, $\{\alpha\}\in(1/2,1)$, $\{\alpha\}=0$ and $\{\alpha\}=1/2$. We will focus on the first case, that is $\{\alpha\}\in(0,1/2)$, the other cases being similar. Also, we assume that $\alpha>\beta$. The case $\alpha\le\beta$ is similar, though easier, since the small ball term dominates in that case.

Thus we focus on the RKHS-approximation term $\varphi^A_{f_0}(\varepsilon)$ in the case where $\alpha>\beta$ and $\{\alpha\}\in(0,1/2)$. Let $\phi$ be a smooth, compactly supported kernel of sufficiently large order and, for $\sigma>0$, define $\phi_\sigma(t)=\sigma^{-1}\phi(t/\sigma)$. Since $f_0\in C^\beta$, we have $\|f_0-f_0*\phi_\sigma\|_\infty\lesssim\sigma^\beta$. Thus $\|(f_0-P)-(f_0*\phi_\sigma-P)\|_\infty\le\varepsilon$ if $\sigma=C\varepsilon^{1/\beta}$ for some constant $C$.
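The bias order $\sigma^\beta$ can be seen on the simplest non-smooth example. The sketch below is a hypothetical illustration and not part of the proof. It takes the cusp $f_0(t)=|t-1/2|^\beta$ and a normalized $C^\infty$ bump kernel, and evaluates the smoothing bias at $t=1/2$ by a midpoint rule; halving $\sigma$ divides the bias by $2^\beta$. It checks a single point only, not the sup-norm bound, and for larger $\beta$ the sup-norm bound requires a kernel of higher order.

```python
import math

def bump(x):
    """Unnormalized C-infinity bump function supported on (-1, 1)."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def bias_at_cusp(beta, sigma, n=4000):
    """|(f0 * phi_sigma)(1/2) - f0(1/2)| for the cusp f0(t) = |t - 1/2|**beta.

    phi is the bump normalized to integrate to one, and
    phi_sigma(t) = phi(t / sigma) / sigma.  Substituting s = sigma * x gives
    (f0 * phi_sigma)(1/2) = int |sigma * x|**beta phi(x) dx, computed with the
    midpoint rule on (-1, 1); f0(1/2) = 0, so this value is the bias.
    """
    h = 2.0 / n
    nodes = [-1.0 + (i + 0.5) * h for i in range(n)]
    mass = sum(bump(x) for x in nodes) * h      # normalizing constant of the bump
    return sum(abs(sigma * x) ** beta * bump(x) for x in nodes) * h / mass

beta = 0.6
for sigma in (0.1, 0.05, 0.025):
    # bias / sigma**beta is the same for every sigma: the bias is of order sigma**beta
    print(sigma, bias_at_cusp(beta, sigma), bias_at_cusp(beta, sigma) / sigma ** beta)
```

Choosing $\sigma$ proportional to $\varepsilon^{1/\beta}$ then makes this bias of order $\varepsilon$, as in the text.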

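Let us write Taylor's theorem in the form

$$f_0*\phi_\sigma(t)=\sum_{k=0}^{\lfloor\alpha\rfloor}\frac{(f_0*\phi_\sigma)^{(k)}(0)}{k!}\,t^k+I^{\alpha+1/2}_{0+}I^{1/2-\{\alpha\}}_{0+}\big(f_0^{(\lfloor\beta\rfloor)}*\phi_\sigma^{(\lfloor\alpha\rfloor-\lfloor\beta\rfloor+1)}\big)(t).$$

Here the integral remainder $I^{\lfloor\alpha\rfloor+1}_{0+}\big((f_0*\phi_\sigma)^{(\lfloor\alpha\rfloor+1)}\big)$ has been split using the semigroup property of fractional integrals, since $(\alpha+1/2)+(1/2-\{\alpha\})=\lfloor\alpha\rfloor+1$.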

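For the polynomial $P$, let us choose the polynomial part in the preceding display. Its squared RKHS-norm $\|P\|^2_{\mathbb{H}^Z}$ is bounded by a constant times $\sum_{k=0}^{\lfloor\alpha\rfloor}(f_0*\phi_\sigma)^{(k)}(0)^2$. The term of largest order is $(f_0*\phi_\sigma)^{(\lfloor\alpha\rfloor)}(0)^2=\big(f_0^{(\lfloor\beta\rfloor)}*\phi_\sigma^{(\lfloor\alpha\rfloor-\lfloor\beta\rfloor)}(0)\big)^2$. Note that, since $f_0$ is in $C^\beta$, denoting by $\{\beta\}$ the fractional part of $\beta$,

$$\big|f_0^{(\lfloor\beta\rfloor)}*\phi_\sigma^{(\lfloor\alpha\rfloor-\lfloor\beta\rfloor)}(0)\big|=\Big|\int\big\{f_0^{(\lfloor\beta\rfloor)}(0-s)-f_0^{(\lfloor\beta\rfloor)}(0)\big\}\,\phi_\sigma^{(\lfloor\alpha\rfloor-\lfloor\beta\rfloor)}(s)\,ds\Big|\lesssim\int|s|^{\{\beta\}}\big|\phi_\sigma^{(\lfloor\alpha\rfloor-\lfloor\beta\rfloor)}(s)\big|\,ds\lesssim\sigma^{\beta-\lfloor\alpha\rfloor}.$$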
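The power of $\sigma$ in the last bound follows from a change of variables, written out here with $j=\lfloor\alpha\rfloor-\lfloor\beta\rfloor$ and $\phi_\sigma^{(j)}(s)=\sigma^{-1-j}\phi^{(j)}(s/\sigma)$. The subtraction of $f_0^{(\lfloor\beta\rfloor)}(0)$ inside the integral is legitimate because $\int\phi_\sigma^{(j)}(s)\,ds=0$ when $j\ge1$:

$$\int|s|^{\{\beta\}}\big|\phi_\sigma^{(j)}(s)\big|\,ds\;\overset{s=\sigma w}{=}\;\sigma^{\{\beta\}-j}\int|w|^{\{\beta\}}\big|\phi^{(j)}(w)\big|\,dw=\sigma^{\beta-\lfloor\alpha\rfloor}\int|w|^{\{\beta\}}\big|\phi^{(j)}(w)\big|\,dw,$$

using $\{\beta\}-j=\{\beta\}+\lfloor\beta\rfloor-\lfloor\alpha\rfloor=\beta-\lfloor\alpha\rfloor$.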



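Hence $\|P\|^2_{\mathbb{H}^Z}\lesssim1+\sigma^{2\beta-2\lfloor\alpha\rfloor}\lesssim\sigma^{-1-2\alpha+2\beta}$. Now notice that $f_0*\phi_\sigma-P$ belongs to $\mathbb{H}^\alpha$ and has RKHS-norm proportional to $\big\|I^{1/2-\{\alpha\}}_{0+}\big(f_0^{(\lfloor\beta\rfloor)}*\phi_\sigma^{(\lfloor\alpha\rfloor-\lfloor\beta\rfloor+1)}\big)\big\|_2$. Thus, in the case where $1/2-\{\alpha\}+\{\beta\}\ne1$, we can use Lemma 6 (with $\lambda=\{\beta\}$) to get

$$\big\|I^{1/2-\{\alpha\}}_{0+}\big(f_0^{(\lfloor\beta\rfloor)}*\phi_\sigma^{(\lfloor\alpha\rfloor-\lfloor\beta\rfloor+1)}\big)\big\|_2\lesssim\int|u|^{1/2-\{\alpha\}+\{\beta\}}\,\big|\phi_\sigma^{(\lfloor\alpha\rfloor-\lfloor\beta\rfloor+1)}(u)\big|\,du\lesssim\sigma^{\beta-\alpha-1/2}.$$

Thus $\|f_0*\phi_\sigma-P\|^2_{\mathbb{H}^\alpha}\lesssim\sigma^{-1-2\alpha+2\beta}\lesssim\varepsilon^{-(2\alpha-2\beta+1)/\beta}$. The small ball term $\varphi^0$ being of smaller order, the concentration function is at most of order $\varepsilon^{-(2\alpha-2\beta+1)/\beta}$, which concludes the proof in this case.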
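The exponent bookkeeping in this case can be double-checked with exact arithmetic. The hypothetical helper below follows the proof. Lemma 6 is applied with order $1/2-\{\alpha\}$, Hölder index $\{\beta\}$ and kernel $\phi_\sigma^{(j)}$, where $j=\lfloor\alpha\rfloor-\lfloor\beta\rfloor+1$. Since $\phi_\sigma^{(j)}(u)=\sigma^{-1-j}\phi^{(j)}(u/\sigma)$, the bound scales as $\sigma^{1/2-\{\alpha\}+\{\beta\}-j}$. Constants are dropped and $\sigma=\varepsilon^{1/\beta}$ is assumed.

```python
from fractions import Fraction
import math

def rkhs_exponents(alpha, beta):
    """Exponent bookkeeping for the case {alpha} in (0, 1/2), alpha > beta.

    Lemma 6 is used with order a' = 1/2 - {alpha}, Hoelder index lam = {beta}
    and g = phi_sigma^(j), j = floor(alpha) - floor(beta) + 1.  As
    phi_sigma^(j)(u) = sigma**(-1 - j) phi^(j)(u / sigma), the bound
    int |u|**(a' + lam) |g(u)| du scales as sigma**(a' + lam - j).
    Returns (sigma exponent of the L2 norm,
             eps exponent of the squared norm when sigma = eps**(1/beta)).
    """
    alpha, beta = Fraction(alpha), Fraction(beta)
    frac_a = alpha - math.floor(alpha)
    frac_b = beta - math.floor(beta)
    if not (0 < frac_a < Fraction(1, 2) and alpha > beta):
        raise ValueError("only the case {alpha} in (0, 1/2), alpha > beta is covered")
    a_prime, lam = Fraction(1, 2) - frac_a, frac_b
    if a_prime + lam == 1:
        raise ValueError("a' + lam = 1: this is the log case handled by Lemma 7")
    j = math.floor(alpha) - math.floor(beta) + 1
    sigma_exp = a_prime + lam - j
    return sigma_exp, 2 * sigma_exp / beta

print(rkhs_exponents(Fraction(9, 4), Fraction(3, 2)))
```

For instance, with $\alpha=9/4$ and $\beta=3/2$ it returns $(-5/4,-5/3)$, matching $\beta-\alpha-1/2$ and $-(2\alpha-2\beta+1)/\beta$.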

If $1/2-\{\alpha\}+\{\beta\}=1$, let us apply Lemma 7 with $\delta=\{\beta\}$ (so that $1-\delta=1/2-\{\alpha\}$) to obtain

$$\big\|I^{1/2-\{\alpha\}}_{0+}\big(f_0^{(\lfloor\beta\rfloor)}*\phi_\sigma^{(\lfloor\alpha\rfloor-\lfloor\beta\rfloor+1)}\big)\big\|_2^2\lesssim\sigma^{-2\alpha+2\beta-1}\int w^2\,\phi^{(\lfloor\alpha\rfloor-\lfloor\beta\rfloor+1)}(w)^2\Big\{1+\log^2\Big(1+\frac{1}{\sigma|w|}\Big)\Big\}\,dw.$$

Using the inequality $1+(\sigma|w|)^{-1}\le\sigma^{-1}(1+|w|^{-1})$, valid for $0<\sigma<1$, one obtains that the norm of $f_0*\phi_\sigma-P$ in $\mathbb{H}^\alpha$ is bounded by a constant times $\log(\sigma^{-1})\,\sigma^{-1/2-\alpha+\beta}$, which concludes the proof. □

4.2. Proofs of the lemmas

Proof of Lemma 3. The concentration function $\varphi_{w_0}$ is the sum $\varphi^A_{w_0}+\varphi^0$ of two decreasing functions, see Equation (3). Let us show that $\varphi^0$ is strictly decreasing, that is, $\varphi^0(\varepsilon)>\varphi^0(\varepsilon')$ if $\varepsilon'>\varepsilon$. It suffices to see that $\mathcal{C}=\{b\in\mathbb{B}:\ \varepsilon<\|b\|<\varepsilon'\}$ receives positive mass under the law of $Z$. Since $Z$ is non-degenerate, its RKHS $\mathbb{H}$ contains a non-zero element $h_1$. For some $\lambda>0$, the element $\lambda h_1\in\mathbb{H}$ belongs to the set $\mathcal{C}$. Since $\mathcal{C}$ is an open set, there exists $\eta>0$ such that the ball $B(\lambda h_1,\eta)$ is included in $\mathcal{C}$. Thus, to obtain the strict monotonicity, it suffices to show that the probability $P(\|Z-h\|<\eta)$ of an arbitrary open ball centered at an element $h\in\mathbb{H}$ is positive.

This can be proved as follows. Using the Cameron-Martin change of variables formula, see e.g. Lemma 3.2 in [?], it suffices to prove that any ball centered at zero of positive radius receives positive mass under the law of $Z$. Since $\mathbb{B}$ is separable, for any $r>0$ the space $\mathbb{B}$ is the union of a countable number of balls of radius $r$. Thus at least one of these balls, say $B(x,r)$, receives positive mass. Let $Z'$ be an independent copy of the process $Z$. We have

$$0<P(Z\in B(x,r))\,P(Z'\in B(x,r))\le P(Z-Z'\in B(0,2r)).$$

Now note that $(Z-Z')/\sqrt2$ has the same distribution as $Z$ (these are centered Gaussian processes with the same covariance function), so that $P(Z\in B(0,\sqrt2\,r))=P(Z-Z'\in B(0,2r))>0$. Since $r>0$ is arbitrary, $P(Z\in B(0,r))$ is positive for any $r>0$.

Now only the convexity statement remains to be proved. Using the fact that the function $h\mapsto\|h\|^2_{\mathbb{H}}$ is convex together with the definition of the infimum, one gets that $\varphi^A_{w_0}$ is convex. The fact that $\varphi^0$ is convex is a consequence of the general fact that the probability measure of a mean-zero Gaussian process is log-concave, see for instance Lemma 1.1 in [4]. □

Proof of Lemma 4. First, let us check that for any $h$ in the RKHS $\mathbb{H}$ of $X^\alpha$, it holds $\|h\|^2_{\mathbb{H}}=h(0)^2+\|h-h(0)\|^2_{\mathbb{H}}$. We use the following well-known fact. Suppose a process $X$ is a sum of two independent centered Gaussian components $V$ and $W$, with supports $\mathbb{B}^V$ and $\mathbb{B}^W$ and RKHS $\mathbb{H}^V$ and $\mathbb{H}^W$ respectively, such that $\mathbb{B}^V\cap\mathbb{B}^W=\{0\}$ and $\mathbb{B}^V$ is complemented by a closed subspace that contains $\mathbb{B}^W$. Then the RKHS $\mathbb{H}$ of $X$ is the direct sum of $\mathbb{H}^V$ and $\mathbb{H}^W$, and $\|h^V+h^W\|^2_{\mathbb{H}}=\|h^V\|^2_{\mathbb{H}^V}+\|h^W\|^2_{\mathbb{H}^W}$, see for instance Lemma 9.1 in [15]. We apply this fact to the decomposition $X^\alpha=V+W$, with $V=Z_0$ and $W=X^\alpha-Z_0$, see Equation (9). The support $\mathbb{B}^V$ is the set of all constant functions, while $\mathbb{B}^W$ is included in the (closed) set of all continuous functions $f$ with $f(0)=0$. Since $\mathbb{B}^V\cap\mathbb{B}^W=\{0\}$, the preceding result implies the announced decomposition.
Now note that, $p$ being a constant function (hence an element of $\mathbb{H}$),

$$\inf_{h\in\mathbb{H},\ \|h-w_0-p\|_\infty<\varepsilon}\|h\|^2_{\mathbb{H}}=\inf_{g\in\mathbb{H},\ \|g-w_0\|_\infty<\varepsilon}\|g+p\|^2_{\mathbb{H}}.$$

For any $g$ belonging to the set defining the latter infimum,

$$\|g+p\|^2_{\mathbb{H}}=\|g+p-g(0)-p\|^2_{\mathbb{H}}+(g(0)+p)^2=\|g-g(0)\|^2_{\mathbb{H}}+(g(0)+p)^2=\|g\|^2_{\mathbb{H}}+2g(0)p+p^2.$$

Since $\|g-w_0\|_\infty<\varepsilon$, in particular we have $|g(0)-w_0(0)|<\varepsilon$, which gives the desired bound on the infimum and hence on the concentration function. □

Proof of Lemma 7. From the proof of Theorem 14 in [7, p. 588], we know that for any $0<t<1$ and $0<u<t$, it holds

$$\big|I^{1-\delta}_{0+}f(t-u)-I^{1-\delta}_{0+}f(t)\big|\lesssim u+u\int_1^{t/u}w^{\delta}\big\{(w-1)^{-\delta}-w^{-\delta}\big\}\,dw.$$

Since $\delta\in(0,1)$, the latter integral is bounded if $t/u\le2$. If $t/u>2$, we split the integral into a part over $[1,2]$, which is bounded, and a part over $[2,t/u]$. For the latter part, the mean value theorem gives $|(w-1)^{-\delta}-w^{-\delta}|\le\delta(w-1)^{-\delta-1}$. Thus, using the inequality $w\le2(w-1)$ for $w\ge2$, we obtain that the integrand is bounded from above by a constant times $(w-1)^{-1}$, which leads to

$$\big|I^{1-\delta}_{0+}f(t-u)-I^{1-\delta}_{0+}f(t)\big|\lesssim u\,\{1+\log(1+t/u)\}.\qquad(10)$$

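But this also holds for $t<u$. In that case $I^{1-\delta}_{0+}f(t-u)=0$ by definition, and the preceding display with $u=t$ gives $|I^{1-\delta}_{0+}f(t)|\lesssim t\le u\{1+\log(1+t/u)\}$. Thus, using Fubini's theorem (together with $\int g(u)\,du=0$) and then (10), one obtains that for any $t>0$,

$$\big|I^{1-\delta}_{0+}(f*g)(t)\big|\le\int\big|I^{1-\delta}_{0+}f(t-u)-I^{1-\delta}_{0+}f(t)\big|\,|g(u)|\,du\lesssim\int|u|\,\{1+\log(1+t/|u|)\}\,|g(u)|\,du.$$

Hence, by the Cauchy-Schwarz inequality (recall that $g$ has compact support),

$$\big\|I^{1-\delta}_{0+}(f*g)\big\|_2^2\lesssim\int_0^1\!\!\int\{1+\log(1+t/|u|)\}^2\,u^2g(u)^2\,du\,dt\lesssim\int u^2\{1+\log^2(1+|u|^{-1})\}\,g(u)^2\,du.$$

□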



5. Conclusion 



We have defined a notion of lower bound for the rate of convergence of the posterior distribution, and given a scheme to obtain lower bounds in a nonparametric framework when the prior is a Gaussian process. Lower and upper bound rates turn out to be intimately related to the behavior of the concentration function $\varphi_{f_0}$ of the Gaussian process at the true $f_0$. When $f_0$ is smooth enough, the small ball term in $\varphi_{f_0}$ dominates and determines the rate. On the contrary, when the prior is much smoother than the function, the RKHS-approximation term dominates. In general, some extra information on $f_0$ is then needed in order to determine the precise behavior of $\varphi_{f_0}$ explicitly. In the framework of Section 3.1, we were able to show that known upper bound rates are, up to constants or log factors, also lower bound rates, thus establishing the optimality of these rates up to constants or log factors. In Section 3.2, we have obtained lower bound results for the posterior rate when the prior is itself non-Gaussian (though constructed from a Gaussian prior) using Lemma 1 directly. Since the proof of Theorem 1 on Gaussian priors also relies on this result, our work also underlines the usefulness of Lemma 1 in obtaining lower bound results.